COCA Filters: Co-occurrence Aware Bloom Filters
Authors
Abstract
We propose an indexing data structure based on a novel variation of Bloom filters. Signature files have been proposed in the past as a method for indexing large text databases, but they suffer from a high false positive error. In this paper we introduce COCA Filters, a new type of Bloom filter that exploits the co-occurrence probability of words in documents to reduce the false positive error. We show experimentally that this technique can reduce the false positive error by a factor of up to 21 for the same index size. Furthermore, Bloom filters can be replaced by COCA filters wherever the co-occurrence of any two members of the universe is identifiable.
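For readers unfamiliar with the baseline, the following is a minimal sketch of a standard Bloom filter used as a per-document word index, together with an empirical false positive measurement. It does not implement the COCA construction, which the abstract does not specify. The class name `BloomFilter`, the helper `false_positive_rate`, the SHA-256 double-hashing scheme, and all parameter values are assumptions made for illustration only.

```python
import hashlib


class BloomFilter:
    """Standard Bloom filter with m bits and k hash positions per word.

    Illustrative baseline only; this is NOT the COCA variant.
    """

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, word: str):
        # Assumed scheme: derive k positions by double hashing one SHA-256 digest.
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, word: str) -> None:
        # Set every bit position associated with the word.
        for pos in self._positions(word):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, word: str) -> bool:
        # True if all positions are set: "possibly present", never a false negative.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(word)
        )


def false_positive_rate(filt: BloomFilter, absent_words: list[str]) -> float:
    """Fraction of words known to be absent that the filter still reports."""
    hits = sum(1 for word in absent_words if word in filt)
    return hits / len(absent_words)


if __name__ == "__main__":
    document = ["bloom", "filter", "index", "signature", "file"]
    filt = BloomFilter(num_bits=64, num_hashes=3)
    for word in document:
        filt.add(word)

    # Query words that were never inserted to estimate the false positive error.
    absent = [f"absent-{i}" for i in range(1000)]
    print(f"empirical false positive rate: {false_positive_rate(filt, absent):.3f}")
```

Any word that was inserted is always reported as present; the error the abstract refers to is the fraction of absent words that are reported as present anyway, which grows as the filter fills up for a fixed index size.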
Similar resources
An Approximate Duplicate-Elimination in RFID Data Streams Based on d-Left Time Bloom Filter
Article history: received 6 March 2010; received in revised form 16 July 2011; accepted 18 July 2011; available online 31 July 2011. The RFID technology has been applied to a wide range of areas since it does not require contact in detecting RFID tags. However, due to the multiple readings in many cases in detecting an RFID tag and the deployment of multiple readers, RFID data contains many duplica...
Bloofi: Multidimensional Bloom Filters
Bloom filters are probabilistic data structures commonly used for approximate membership problems in many areas of Computer Science (networking, distributed systems, databases, etc.). With the increase in data size and distribution of data, problems arise where a large number of Bloom filters are available, and all of them need to be searched for potential matches. As an example, in a federated cl...
Proposals of Co-occurrence Frequency Image Based Filters
We have discussed that the co-occurrence frequency image (CFI), defined based on the co-occurrence frequency histogram of the gray values of an image, has the potential to introduce a new scheme for image feature extraction. This paper proposes a couple of filters for image enhancement, such as sharpening and smoothing filters. These filters are very similar to but quite different from those whic...
Optimizing Learned Bloom Filters by Sandwiching
We provide a simple method for improving the performance of the recently introduced learned Bloom filters, by showing that they perform better when the learned function is sandwiched between two Bloom filters.
Bloom-Based Filters for Hierarchical Data
In this paper, we present two novel hash-based indexing structures, based on Bloom filters, called breadth and depth Bloom filters, which, in contrast to traditional hash-based indexes, are able to represent hierarchical data and support path expression queries. We describe how these structures can be used for resource discovery in peer-to-peer networks. We have implemented both structures and o...